Abstract

We examine the complexity of learning the distributions produced by finite-state quantum 
sources. We show how prior techniques for learning hidden Markov models can be adapted 
to the quantum generator model to find that the analogous state of affairs holds: information- 
theoretically, a polynomial number of samples suffice to approximately identify the distribution, 
but computationally, the problem is as hard as learning parities with noise, a notorious open 
question in computational learning theory. 

1 Introduction 

In recent work, Wiesner and Crutchfield [15] introduced Quantum Generators as a formal model 
of simple quantum mechanical systems. In this model, a simple quantum mechanical system is 
observed repeatedly, yielding a classical stochastic process consisting of the sequence of discrete 
measurement outcomes, analogous to how an underlying Markov process yields a sequence of ob- 
servations in a hidden Markov model. From this perspective, it is natural to wonder what can 
be learned about such a simple quantum mechanical system from the sequence of measurement 
outcomes. 

In this work, we consider whether it is feasible to learn the distribution on measurement
outcomes from a reasonable (polynomially bounded) number of observations. We state two theorems
on this subject. First, in Section 3, we show that it is information-theoretically possible to
learn the distribution over measurements for binary processes from polynomially many
observations. We then show in Section 4 that, under a standard hardness assumption (Conjecture 4,
that it is computationally infeasible to learn parity functions in the presence of classification
noise), it is also computationally infeasible to learn the output distribution of a Quantum
Generator (also for a binary alphabet).

2 Preliminaries 

We begin by recalling the formal definition of Quantum Generators (specialized to binary observa- 
tions here) and the models of learning that we will need. 

* Supported by a NSF Graduate Research Fellowship. 
2.1 The Quantum Generator Model

Quantum Generators, defined by Wiesner and Crutchfield [15], are a model of a simple, repeatedly
observed quantum mechanical system. Formally:

Definition 1 (Quantum Generator) A $k$-state Quantum Generator is given by a four-tuple
$(|\psi_0\rangle, U, M, \Sigma)$ where the initial state $|\psi_0\rangle \in \mathbb{C}^k$ has $\ell_2$-norm 1, $U$ is a
unitary transformation on $\mathbb{C}^k$, $\Sigma$ is a finite set of measurement outcomes, and $M$ is a
projective measurement operator, i.e., there is a partition of $\{1, \ldots, k\}$ into $|\Sigma|$ sets such
that associated with each $\sigma \in \Sigma$, there is a projection $M_\sigma$ onto the associated coordinates.

A Quantum Generator produces a probability distribution in the following way: given $|\psi_t\rangle$, for
each $\sigma \in \Sigma$, $x_{t+1} = \sigma$ and $|\psi_{t+1}\rangle = M_\sigma U|\psi_t\rangle / \|M_\sigma U|\psi_t\rangle\|_2$ with probability
$\|M_\sigma U|\psi_t\rangle\|_2^2$. Thus, in particular, the probability of the $n$-symbol output
$x_1, \ldots, x_n \in \Sigma^n$ is given by $\|M_{x_n} U \cdots M_{x_1} U|\psi_0\rangle\|_2^2$.
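
As a concrete illustration of Definition 1, the following minimal Python sketch computes the probability of an outcome string by alternating the unitary with the projection for each observed symbol. The function names and the 2-state example (a Hadamard-like unitary with one basis state per outcome) are illustrative assumptions, not part of the model's definition.

```python
import itertools
import math

def apply(U, psi):
    """Multiply the k x k matrix U (a list of rows) by the vector psi."""
    return [sum(U[r][c] * psi[c] for c in range(len(psi))) for r in range(len(U))]

def project(psi, block):
    """Apply the projection onto the coordinates listed in `block`."""
    return [a if i in block else 0 for i, a in enumerate(psi)]

def string_probability(psi0, U, blocks, xs):
    """Return ||M_{x_n} U ... M_{x_1} U psi0||_2^2 for the outcome string xs."""
    psi = list(psi0)
    for x in xs:
        # Keep the unnormalized projected state; its squared norm is the
        # probability of the outcomes seen so far.
        psi = project(apply(U, psi), blocks[x])
    return sum(abs(a) ** 2 for a in psi)

# Hypothetical 2-state example: outcome b is reported by basis state b.
h = 1 / math.sqrt(2)
U = [[h, h], [h, -h]]
blocks = {0: {0}, 1: {1}}
psi0 = [1, 0]
probs = {xs: string_probability(psi0, U, blocks, xs)
         for xs in itertools.product((0, 1), repeat=3)}
```

Because each step keeps the unnormalized projected state, the final squared norm is the probability of the whole string, and summing it over all strings of a fixed length should give 1 up to rounding.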

In this work, we will only consider measurements with two output symbols. Thus, in general (if the
system has more than two basis states), we only consider degenerate measurements. This comes, of
course, with some loss of generality, but it also means that the hardness result in Theorem 5 holds
even for a highly restricted class.

From a theoretical perspective, it is also natural to wonder if it is necessary to link the output
distribution and measurement of the quantum system (and certainly, proposals for formal models
that do not identify these two concepts exist in the literature [13, 7]), but in their work, Wiesner
and Crutchfield stress that the resulting (alternative) models do not capture simple physical
systems. Since we wish to strive for relevance in this case, we adopt the model of Wiesner and
Crutchfield here. Again, we also stress that our negative result holds even for this more
restricted class of (physically relevant) processes.

We also remark that we allow our Quantum Generators to start in an arbitrary state, and in
the model of learning distributions that we consider, we assume that it is possible to take many
independent samples from this distribution. This is arguably unrealistic, but we note that the
hardness result is likely to be more relevant to practice, and the construction we use in our
hardness result turns out to have two desirable properties: first, it starts in a basis state (i.e., of
the form $e_j$), and second, the $mn$-symbol distribution of the Quantum Generator is distributed
identically to $m$ independent copies of the $n$-symbol distribution, so we also have hardness for
learning from a single, long sample. For more details, consult Appendix B.

2.2 Models of learning distributions 

In contrast to the classic PAC model, and in contrast to the approach taken by Abe and Warmuth
in their treatment of probabilistic automata [1], our positive and negative results will all be given for
the representation-independent "improper PAC" distribution-learning model introduced by Kearns
et al. [9]. Specifically, we use their notion of learning with an evaluator:

Definition 2 (Distribution learning under the KL-divergence) We say that a class of
distributions $\mathcal{D}$ is learnable under the KL-divergence in $m$ samples (time complexity $t$) if there is
an algorithm that, on input $n$, $\epsilon$, $\delta$, and $x_1, \ldots, x_m \in \{0,1\}^n$ sampled from $D_n$ for
$D = \{D_n\}_n$ an ensemble from $\mathcal{D}$, outputs an "evaluator" circuit $E : \{0,1\}^n \to [0,1]$ (within $t$
steps) such that the distribution on $\{0,1\}^n$ computed by $E$ satisfies $KL(D_n\|E) < \epsilon$ with
probability $1 - \delta$.
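
To make the success criterion of Definition 2 concrete, the sketch below evaluates $KL(D_n\|E)$ by brute-force enumeration of $\{0,1\}^n$. This is only feasible for small $n$; the choice of base-2 logarithms and the two example distributions are assumptions made for illustration.

```python
import itertools
import math

def kl_divergence(D, E, n):
    """KL(D || E) in bits, summing over all strings in {0,1}^n."""
    total = 0.0
    for x in itertools.product((0, 1), repeat=n):
        p = D(x)
        if p == 0:
            continue                    # 0 * log(0 / q) is taken to be 0
        q = E(x)
        if q == 0:
            return math.inf             # E rules out a string that D can emit
        total += p * math.log2(p / q)
    return total

def uniform(x):
    """Hypothetical target: the uniform distribution on {0,1}^2."""
    return 0.25

def biased(x):
    """Hypothetical evaluator: independent bits, each 1 with probability 1/4."""
    return math.prod(0.25 if b == 1 else 0.75 for b in x)

self_divergence = kl_divergence(uniform, uniform, 2)
cross_divergence = kl_divergence(uniform, biased, 2)
```

An evaluator that assigns probability zero to a string in the support of $D_n$ has infinite divergence, which is why the learner in Section 3 works with perturbed distributions.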
We will comment explicitly on the time efficiency of the learning algorithm and number of samples
$m$, as appropriate. In particular, if $m$ is an appropriate polynomial (in $n$, $\frac{1}{\epsilon}$, $\log\frac{1}{\delta}$, and in our
case also $k$, the number of states), this corresponds to improper PAC-learning, and if $t$ is an
appropriate polynomial (in the same parameters) then learning is said to be efficient.

We also use a hardness of learning assumption, which depends on the definition of learning in 
the presence of noise [2]: 

Definition 3 (Learning in the presence of noise) We say that a class of boolean functions $C$
is efficiently learnable under the uniform distribution with noise rate $\eta$ if there is an algorithm
that, on input $n$, $\epsilon$, $\delta$, and $\eta$, when given $x_1, \ldots, x_m$ uniformly chosen from $\{0,1\}^n$ and
$b_1, \ldots, b_m$ where each $b_i = f(x_i)$ for a fixed $f \in C$ with probability $1 - \eta$ independently, with
probability $1 - \delta$ outputs the representation of a function $f'$ such that
$\Pr_{x \in \{0,1\}^n}[f(x) \neq f'(x)] < \epsilon$, in time polynomial in $n$, $\frac{1}{\epsilon}$, and $\log\frac{1}{\delta}$.



3 Improper PAC-learnability 

In this section, we adapt the approach that Abe and Warmuth [1] used to show that (classical)
probabilistic automata are PAC-learnable, and show that the distributions produced by Quantum
Generators are improperly PAC-learnable under the KL-divergence.

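Following Kitaev, we employ the set of gates $\{I, S, K, \oplus, \wedge_\oplus\}$ where $I$ is the identity gate,
$S = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$ is a scaled Hadamard gate, $K = \begin{pmatrix} 1 & 0 \\ 0 & i \end{pmatrix}$ is a phase shift,
$\oplus(|a, b\rangle) = |a, a \oplus b\rangle$, and $\wedge_\oplus(|a, b, c\rangle) = |a, b, (a \wedge b) \oplus c\rangle$ is a Toffoli gate. We first
recall the Solovay-Kitaev Theorem [11]:

Theorem 1 (Solovay-Kitaev) For any $\delta > 0$ and $n$-qubit unitary $U$, there is an
$O(2^{2n}(n + \mathrm{poly}\log\frac{1}{\delta}))$-gate $\ell_2$ $\delta$-approximation to $U$ in our set of gates.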

In particular, since a $k$-state Quantum Generator has a unitary with a $\log k$-qubit representation,
we find:

Claim 2 There is an $\epsilon$-net under the $\ell_\infty$ distance on the $n$-symbol output distributions of
$k$-state Quantum Generators of size $2^{\mathrm{poly}(n, k, \log\frac{1}{\epsilon})}$.

The key of Abe and Warmuth's analysis was that for any distributions $P$ and $Q$, the
KL-divergences of the empirical distributions $\hat{P}_n$ from $Q_n$, $KL(\hat{P}_n\|Q_n)$, converge to
$KL(P_n\|Q_n)$ (essentially by Hoeffding's inequality), where we can calculate the former quantity for
a given distribution $Q$ from our $\epsilon$-net. At this point, the learning algorithm is essentially
obvious; the only problem is that the KL-divergence is infinite for strings outside the support of
a distribution from our $\epsilon$-net, which would prevent the use of the concentration result. We avoid
this by perturbing the distributions slightly: in the distribution over $n$-symbol samples, we fix
the minimum probability that any symbol is output on any step to (roughly) $\epsilon/n$ (altering the
remaining probabilities accordingly). It is easy to see that this guarantees an upper bound on the
KL-divergence (between our modified distribution and any distribution over $n$-symbol strings) of
$n \log\frac{n}{\epsilon}$. Taking (again, roughly) $\epsilon' = (\epsilon/2n)^{2n}$ as the precision of the net, we can show
that for the distribution $\tilde{D}'$ we obtain from our perturbed approximation to a distribution $D$
obtained from a Quantum Generator, the total KL-divergence from $D$ is at most $\epsilon$. Note that the
elements of the $\epsilon'$-net still have representations of size polynomial in $n$ since the dependence
on $\epsilon'$ was only polylogarithmic. Thus, we find:
Theorem 3 The class of $k$-state Quantum Generators is learnable under the KL-divergence with
sample complexity $\mathrm{poly}(n, k, \frac{1}{\epsilon}, \log\frac{1}{\delta})$.

The full proof is given in Appendix A.

4 Computational hardness of learning 

We now show the computational hardness of learning the output distributions of Quantum
Generators, under the assumption that learning noisy parity functions is hard. More specifically,
we say that a function $f : \{0,1\}^n \to \{0,1\}$ is a parity function if there is some
$S \subseteq \{1, \ldots, n\}$ such that $f(x) = \bigoplus_{i \in S} x_i$, and we assume that it is hard to identify the
set $S$ when we are given random examples of $f$ with $f(x)$ negated with some probability $\eta$.
Formally, the assumption is:

Conjecture 4 (Noisy Parity Learning) There is a constant $\eta \in (0, 1/2)$ such that no
algorithm learns the class of parity functions with noise rate $\eta$ under the uniform distribution in
time polynomial in $n$, $\frac{1}{\epsilon}$, and $\frac{1}{\delta}$.
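
The input to the noisy parity learning problem can be pictured with the following minimal sketch, which draws uniformly random examples and flips each label independently with probability $\eta$. The function names and the use of Python's `random.Random` as the source of randomness are illustrative assumptions.

```python
import random

def parity(S, x):
    """f_S(x): the XOR of the bits of x indexed (from 1) by the set S."""
    return sum(x[i - 1] for i in S) % 2

def noisy_examples(S, n, eta, m, rng):
    """Draw m uniform examples of f_S, each label negated with probability eta."""
    examples = []
    for _ in range(m):
        x = tuple(rng.randrange(2) for _ in range(n))
        flip = 1 if rng.random() < eta else 0
        examples.append((x, parity(S, x) ^ flip))
    return examples

# Hypothetical instance: S = {3, 5} on n = 5 bits with noise rate 0.1.
sample = noisy_examples({3, 5}, 5, 0.1, 10, random.Random(0))
```

The learner sees only the pairs in `sample`; the conjecture asserts that recovering a good approximation to $f_S$ from such pairs is infeasible in polynomial time.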

These functions are known not to be learnable in the restricted statistical query model [8, 3],
which captures most known algorithms for efficient learning in the presence of classification
noise, although the best known algorithm for the problem, due to Blum, Kalai and Wasserman [5],
efficiently learns parities up to size $O(\log n \log\log n)$, which is beyond what can be learned in
the statistical query model. (For parities of $\Theta(n)$ bits, however, the algorithm requires
$2^{O(n/\log n)}$ samples.) Feldman et al. [6] recently showed that many other problems not known to be
learnable in the presence of classification noise reduce to the problem of learning noisy parities,
establishing its central place in the classification noise model. Moreover, this problem is related
to the long-standing open problem of decoding random linear codes [4], and worse still, Feldman et
al. show that learning parities with random noise is as hard as learning parities in the agnostic
learning (adversarial noise) model [10]. Thus, in any case, it represents a serious barrier to the
current state of the art, and any algorithm for our problems of interest would represent a major
breakthrough on numerous fronts.

The result proceeds, simply enough, by showing that a Quantum Generator of modest size
(linear in $n$) can produce exactly the distribution of labeled examples of a parity function with
noise rate $\eta$, where learning the distribution of the parity function is sufficient to learn the
parity. The construction is a modification of the analogous constructions for probabilistic
automata and hidden Markov models given by Kearns et al. and Mossel and Roch, respectively [9, 12].
Our construction is illustrated in Figure 1. The result is:

Theorem 5 Assuming the Noisy Parity Learning Conjecture, no algorithm can learn the $n$-bit
output distribution of a $k$-state Quantum Generator under the KL-divergence in time polynomial
in $n$, $k$, $\frac{1}{\epsilon}$, and $\log\frac{1}{\delta}$.

The proof is given in Appendix B.
Figure 1: A $4(n+1)$-state QG generating a noisy parity of $S = \{3, 5\}$ for $n = 5$. Circles correspond
to states with labels indicating which partition they belong to under the measurement operator;
unlabeled transitions come in pairs with weights $1/\sqrt{2}$ and $i/\sqrt{2}$.

A Proof of improper PAC-learnability

For convenience, for a distribution $P$ on $\{0,1\}^n$ and sample $x \in \{0,1\}^n$, we define
$P_i(x) = P(x_i | x_1, \ldots, x_{i-1})$. Thus, $P(x) = \prod_i P_i(x)$.

Proof of Claim 2: Fix a measurement operator $M$ on a quantum system with $k$ basis states,
and consider the Quantum Generator with a unitary $U$ and starting state $|\psi_0\rangle$. Consider the
$\mathrm{poly}(k, \log\frac{1}{\epsilon_0})$-gate approximation to $U$, $U'$, given by the Solovay-Kitaev Theorem, and a
$2k\log\frac{1}{\epsilon_0}$ bit approximation $|\psi_0'\rangle$ to $|\psi_0\rangle$ with representation $(b_1, \ldots, b_k)$ corresponding
to the normalization of the vector




noting that $(1 - \epsilon_0)^{\frac{1}{\epsilon_0}\log\frac{1}{\epsilon_0}} < \epsilon_0$. We therefore see that $|\psi_0\rangle$ has an approximation
$|\psi_0'\rangle$ such that each entry is within a multiplicative $(1 - \epsilon_0)$-factor unless it is smaller than
$\epsilon_0$, so that in either case, the $\ell_2$ distance between $|\psi_0\rangle$ and $|\psi_0'\rangle$ (recalling that $|\psi_0\rangle$
has $\ell_2$ norm 1) is at most $2\epsilon_0$. Noting that at each step, the probability of $x_1, \ldots, x_i$ is
equal to the squared $\ell_2$ norm of $M_{x_i} U \cdots M_{x_1} U|\psi_0\rangle$, it is easy to see that each application
of $U'$ now grows the gap between $P(x)$ and $P'(x)$ by at most $\epsilon_0$, so the total gap between $P(x)$
and $P'(x)$ is at most $(n+2)\epsilon_0$. Since $M$ has a $k$-bit representation and $U'$ has a
$\mathrm{poly}(n, k, \log\frac{1}{\epsilon_0})$-bit representation, clearly the overall size of the $\epsilon$-net (taking
$\epsilon_0 = \epsilon/(n+2)$) is $2^{\mathrm{poly}(n, k, \log\frac{1}{\epsilon})}$, as claimed. □

For a fixed $\epsilon_1$, given a distribution $P$ and observation $x$, we define the perturbed distribution
$\tilde{P}(x)$ (and associated "corrected" observation $\tilde{x}$) as follows: if $P(x_1) < \epsilon_1$, then
$\tilde{P}(x_1) = \epsilon_1$ and similarly, $\tilde{P}(x_1) = 1 - \epsilon_1$ whenever $P(x_1) > 1 - \epsilon_1$; if $P(x_1) = 0$, then
$\tilde{x}_1 = \neg x_1$, otherwise, we put $\tilde{x}_1 = x_1$. If, on the other hand, $1 - \epsilon_1 \ge P(x_1) \ge \epsilon_1$, then
$\tilde{P}(x_1) = P(x_1)$. Now, assuming that we have defined $\tilde{x}_1, \ldots, \tilde{x}_{i-1}$ and
$\tilde{P}_1(x), \ldots, \tilde{P}_{i-1}(x)$, we similarly define $\tilde{x}_i$ to be $x_i$ if $P(x_i | \tilde{x}_1, \ldots, \tilde{x}_{i-1}) \neq 0$
and $\neg x_i$ otherwise; finally, as before, we put $\tilde{P}_i(x)$ equal to $P(x_i | \tilde{x}_1, \ldots, \tilde{x}_{i-1})$
"restricted" to the range $[\epsilon_1, 1 - \epsilon_1]$.
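
The perturbation just described can be sketched as follows for a binary alphabet. The interface `cond(prefix, bit)`, standing for the conditional probability $P(\text{bit} \mid \text{prefix})$, and the example source are hypothetical; the sketch only illustrates the clamping of each conditional probability and the "corrected" prefix.

```python
import itertools

def clamp(p, eps1):
    """Restrict a conditional probability to the range [eps1, 1 - eps1]."""
    return min(max(p, eps1), 1 - eps1)

def perturbed_probability(cond, x, eps1):
    """Probability of the binary string x under the perturbed distribution.

    cond(prefix, bit) is assumed to return P(bit | prefix).
    """
    prob = 1.0
    prefix = ()                          # the "corrected" observation so far
    for bit in x:
        p = cond(prefix, bit)
        prob *= clamp(p, eps1)
        # A symbol that P deems impossible is flipped in the corrected
        # prefix, so later conditionals are always well defined.
        prefix += (bit if p != 0 else 1 - bit,)
    return prob

def always_zero(prefix, bit):
    """Hypothetical source that outputs 0 at every step."""
    return 1.0 if bit == 0 else 0.0

eps1, n = 0.01, 3
table = {x: perturbed_probability(always_zero, x, eps1)
         for x in itertools.product((0, 1), repeat=n)}
```

For a binary alphabet the two clamped conditionals at each step still sum to 1, so `table` is again a probability distribution, and every string receives probability at least `eps1 ** n`.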

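It is easy to see that $\tilde{P}$ is a probability distribution over $\{0,1\}^n$. Moreover, suppose $P'$ is a
distribution such that $|P'(x) - P(x)| \le \epsilon_2$ for all $x$ (e.g., as obtained via Claim 2). We then have
that its perturbation $\tilde{P}'$ satisfies $\tilde{P}' \ge \epsilon_1^n$ and $\tilde{P}'_i(x) < P'_i(x)$ only when
$\tilde{P}'_i(x) = 1 - \epsilon_1$, and thus

$$
\begin{aligned}
KL(P\|\tilde{P}') &= \sum_x P(x) \sum_i \log\frac{P_i(x)}{\tilde{P}'_i(x)} \\
&\le \sum_{x : P(x) \ge \epsilon_1^n} P(x) \left( \log\frac{P(x)}{P'(x)} + \sum_{i : 1 - \epsilon_1 < P'_i(x)} \log\frac{1}{1 - \epsilon_1} \right) \\
&\le \sum_{x : P(x) \ge \epsilon_1^n} P(x) \log\frac{P(x)}{P(x) - \epsilon_2} + n\log\frac{1}{1 - \epsilon_1} \\
&\le \log\left(1 + \frac{\epsilon_2}{\epsilon_1^n - \epsilon_2}\right) + n\log\left(1 + \frac{\epsilon_1}{1 - \epsilon_1}\right) \\
&\le \frac{\epsilon_2}{\epsilon_1^n - \epsilon_2} + \frac{n\epsilon_1}{1 - \epsilon_1}
\end{aligned}
$$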

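so if we take $\epsilon_1^n - \epsilon_2 = \sqrt{\epsilon_2}$, i.e., $\epsilon_1 = \epsilon_2^{1/2n}(1 + \sqrt{\epsilon_2})^{1/n}$, then
$KL(P\|\tilde{P}') \le \sqrt{\epsilon_2} + n\epsilon_1/(1 - \epsilon_1)$. Thus, for a desired $\epsilon_0$, taking
$\epsilon_2 = (\epsilon_0/2(n+1))^{2n}$ suffices to give $KL(P\|\tilde{P}') \le \epsilon_0$. Moreover, the size of the $\epsilon_2$-net is
still $2^{\mathrm{poly}(n, k, \log\frac{1}{\epsilon_0})}$ (with a larger dependence on $n$) and since $\tilde{P}' \ge \epsilon_1^n$, for every
distribution $Q$ over $\{0,1\}^n$, we find

$$
KL(Q\|\tilde{P}') = \sum_x Q(x)\log\frac{1}{\tilde{P}'(x)} - H(Q) \le \sum_x Q(x)\log\frac{1}{\epsilon_1^n} = n\log\frac{1}{\epsilon_1} \le n\log\frac{2(n+1)}{\epsilon_0}.
$$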



We now recall the following standard lemma used by Abe and Warmuth [1], following from
Hoeffding's inequality. (They reference Pollard [14].)

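Lemma 6 Let $\mathcal{F}$ be a finite set of random variables with range bounded by $[0, M]$. Let $D$ be an
arbitrary distribution. Then, if

$$
m \ge \frac{M^2}{\epsilon^2}\left(\ln|\mathcal{F}| + \ln\frac{1}{\delta}\right)
$$

we have

$$
\Pr_{x_1, \ldots, x_m \in D}\left[\exists f \in \mathcal{F} : \left|\frac{1}{m}\sum_i f(x_i) - E_D[f]\right| > \epsilon\right] < \delta
$$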



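Naturally, if $\mathcal{P}$ is the set of perturbed distributions from our $\epsilon_2$-net, we apply this lemma with
$\mathcal{F} = \{\log\frac{1}{\tilde{P}'} : \tilde{P}' \in \mathcal{P}\}$. Thus, $\ln|\mathcal{F}| = \mathrm{poly}(n, k, \log\frac{1}{\epsilon_0})$ and $M = n\log\frac{2(n+1)}{\epsilon_0}$. We
also use $\epsilon_0$ as $\epsilon$, for convenience.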

For the corresponding polynomial number of samples we find, following Abe and Warmuth, that
for the true distribution $P$, its perturbed estimate $\tilde{P}'$, and any perturbed distribution $P^*$
achieving the minimum value of $\frac{1}{m}\sum_i \log\frac{1}{P^*(x_i)}$, with probability $1 - \delta$, the following
simultaneously hold:

$$
\begin{aligned}
E_P\left[\log\frac{1}{P^*}\right] - \frac{1}{m}\sum_i \log\frac{1}{P^*(x_i)} &\le \epsilon_0 \\
\frac{1}{m}\sum_i \log\frac{1}{\tilde{P}'(x_i)} - E_P\left[\log\frac{1}{\tilde{P}'}\right] &\le \epsilon_0 \\
\frac{1}{m}\sum_i \log\frac{1}{P^*(x_i)} - \frac{1}{m}\sum_i \log\frac{1}{\tilde{P}'(x_i)} &\le 0
\end{aligned}
$$

By summing the three, we find

$$
E_P\left[\log\frac{1}{P^*}\right] - E_P\left[\log\frac{1}{\tilde{P}'}\right] \le 2\epsilon_0
$$

so therefore $KL(P\|P^*) - KL(P\|\tilde{P}') \le 2\epsilon_0$. Since we argued above that $KL(P\|\tilde{P}') \le \epsilon_0$, we
find that $KL(P\|P^*) \le 3\epsilon_0$, so by taking $\epsilon_0$ sufficiently small, we see that it is sufficient to
output a circuit corresponding to this $P^*$. Since evaluating $P^*$ from its gate construction merely
involves performing a polynomial number of matrix operations to polynomial precision, Theorem 3
follows.
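
The selection rule used in this argument, choosing the candidate $P^*$ that minimizes the empirical average of $\log\frac{1}{P^*(x_i)}$, can be sketched as below. The candidate family (independent biased bits standing in for the perturbed net elements), the labels, and the sample are hypothetical, and logarithms are taken base 2 as an assumption.

```python
import math

def empirical_log_loss(Q, sample):
    """Average of log2(1 / Q(x)) over the sample."""
    return sum(math.log2(1 / Q(x)) for x in sample) / len(sample)

def select_hypothesis(candidates, sample):
    """Return the label of the candidate with the smallest empirical log loss."""
    return min(candidates,
               key=lambda name: empirical_log_loss(candidates[name], sample))

def product_bernoulli(q):
    """Hypothetical candidate: each bit is 1 independently with probability q."""
    return lambda x: math.prod(q if b == 1 else 1 - q for b in x)

# Every candidate assigns positive probability to every string, as the
# perturbation step guarantees for the actual net.
candidates = {q: product_bernoulli(q) for q in (0.1, 0.5, 0.9)}
sample = [(1, 1), (1, 0), (1, 1), (1, 1)]
best = select_hypothesis(candidates, sample)
```

This brute-force search over the net is what makes the result information-theoretic only: the number of candidates is $2^{\mathrm{poly}(n, k, \log\frac{1}{\epsilon_0})}$, so the running time is not polynomial.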

B Proof of computational hardness 

Let any parity function $f_S$ and any noise rate $\eta \in (0, 1/2)$ be given. Following the constructions
of Kearns et al. [9] and Mossel and Roch [12], we describe a $4(n+1)$-state Quantum Generator for
which the $(n+1)$-symbol output distribution is precisely the noisy parity distribution:
$(x, f_S(x) \oplus b)$ where $x \in \{0,1\}^n$ is uniformly chosen and $b \in \{0,1\}$ has $b = 1$ with probability $\eta$.

Construction: For convenience, we will index the basis states by
$(j, k, \ell) \in \{0, 1, \ldots, n\} \times \{0,1\} \times \{0,1\}$, where (cf. Figure 1) we think of $j$ as representing a
column, $k = 1$ as representing the "top half," and $\ell = 1$ as representing the "upper state." We will
explicitly describe the entries of the matrix representation of the Quantum Generator's unitary.
(Verifying next that the matrix actually describes a unitary transformation, of course!)

For each column $(j, k, \ell)$, there are exactly two nonzero entries, each in rows of the form
$(j + 1 \bmod n + 1, k', \ell')$. For $j = 0, \ldots, n - 1$, if $(j + 1) \notin S$, then the nonzero entries are
$1/\sqrt{2}$ in $(j + 1, k, \ell)$ and $i/\sqrt{2}$ in $(j + 1, k, \ell \oplus 1)$; if $(j + 1) = \min(S)$, then the nonzero
entries are $1/\sqrt{2}$ in $(j + 1, k, \ell)$ and $i/\sqrt{2}$ in $(j + 1, k \oplus 1, \ell)$; and, if $(j + 1) \in S$ but it is not
the minimum element, then the entries are $1/\sqrt{2}$ in $(j + 1, k \oplus \ell, k)$ and $i/\sqrt{2}$ in
$(j + 1, 1 \oplus k \oplus \ell, k)$. Finally, if $j = n$, then the nonzero entries are $\sqrt{1 - \eta}$ in $(0, k, \ell)$ and
$i\sqrt{\eta}$ in $(0, k \oplus 1, \ell)$. We further observe that each row also has exactly two nonzero entries,
one in column $(j, k, \ell)$ with zero imaginary part and one in column $(j, k', \ell')$ with zero real part;
moreover, these two columns appear together in the support of another row, with column $(j, k, \ell)$
having zero real part and $(j, k', \ell')$ having zero imaginary part.

Claim 7 The linear transformation corresponding to this matrix is unitary. 

Proof: To see that this matrix is unitary, it suffices to show that the $\ell_2$ weight from entries with
index $j$ is preserved in the entries with index $j + 1 \pmod{n+1}$ after application of the
corresponding transformation. Let any vector in $\mathbb{C}^{4(n+1)}$ be given; we decompose its entries into
real and imaginary parts, $u(j, k, \ell) + iv(j, k, \ell)$. For $j \neq 0$, suppose that the two nonzero entries
in row $(j, k, \ell)$ are in columns $(j - 1, k', \ell')$ and $(j - 1, k'', \ell'')$, where the former has weight with
zero imaginary part, and the latter has zero real part. Then, the output entry $(j, k, \ell)$ is

$$
\frac{1}{\sqrt{2}}\left(u(j-1, k', \ell') - v(j-1, k'', \ell'')\right) + \frac{i}{\sqrt{2}}\left(u(j-1, k'', \ell'') + v(j-1, k', \ell')\right)
$$

so its contribution to the $\ell_2$ weight is

$$
\frac{1}{2}\left(\left(u(j-1, k', \ell') - v(j-1, k'', \ell'')\right)^2 + \left(u(j-1, k'', \ell'') + v(j-1, k', \ell')\right)^2\right)
$$

where, in the other row with columns $(j - 1, k', \ell')$ and $(j - 1, k'', \ell'')$ in its support, the
contribution to the $\ell_2$ weight is

$$
\frac{1}{2}\left(\left(u(j-1, k'', \ell'') - v(j-1, k', \ell')\right)^2 + \left(u(j-1, k', \ell') + v(j-1, k'', \ell'')\right)^2\right)
$$

and therefore, summing over these rows gives that the entries with index $j - 1$ yield $\ell_2$ weight

$$
\sum_{k, \ell}\left(u(j-1, k, \ell)^2 + v(j-1, k, \ell)^2\right)
$$

in entries with index $j$ (again, for $j \neq 0$) of the output. We also similarly find, for $j = 0$, that the
output entry $(0, k, \ell)$ is

$$
\left(\sqrt{1-\eta}\,u(n, k, \ell) - \sqrt{\eta}\,v(n, k \oplus 1, \ell)\right) + i\left(\sqrt{\eta}\,u(n, k \oplus 1, \ell) + \sqrt{1-\eta}\,v(n, k, \ell)\right)
$$

so its contribution to the $\ell_2$ weight is

$$
\begin{aligned}
&(1-\eta)\,u(n, k, \ell)^2 - 2\sqrt{\eta(1-\eta)}\,u(n, k, \ell)\,v(n, k \oplus 1, \ell) + \eta\,v(n, k \oplus 1, \ell)^2 \\
&\quad + \eta\,u(n, k \oplus 1, \ell)^2 + 2\sqrt{\eta(1-\eta)}\,u(n, k \oplus 1, \ell)\,v(n, k, \ell) + (1-\eta)\,v(n, k, \ell)^2
\end{aligned}
$$

where, summing over $(0, 0, \ell)$ and $(0, 1, \ell)$ gives

$$
u(n, 0, \ell)^2 + v(n, 0, \ell)^2 + u(n, 1, \ell)^2 + v(n, 1, \ell)^2
$$

and hence, summing over all $(j, k, \ell)$ in the output, we observe that the $\ell_2$ norm is indeed preserved,
so the linear transformation is unitary. □

Choice of measurement and start state: We let the Quantum Generator's measurement
operator be as follows: for $j \notin \{0\} \cup S$, the basis states of the form $(j, k, b)$ are in the basis of
the subspace corresponding to the outcome $b$; for $j \in S - \{\min(S)\}$, the basis states of the form
$(j, \ell \oplus b, \ell)$ are in the basis corresponding to the outcome $b$; and otherwise, the basis state
$(j, b, \ell)$ is in the basis of the subspace corresponding to the outcome $b$. We take our start state to
be the basis state $(0, 0, 0)$. By the previous claim, this is a $4(n+1)$-state Quantum Generator, as
promised.

Correctness: We are now in a position to verify that the $(n+1)$-symbol output distribution of
the constructed Quantum Generator is the distribution of noisy random labeled examples of $f_S$.

Claim 8 Each $|\psi_t\rangle$ is of the form $\rho e_{(j,k,\ell)}$ where $e_{(j,k,\ell)}$ is a vector corresponding to the basis
state $(j, k, \ell)$ and $\rho \in \mathbb{C}$ satisfies $|\rho| = 1$.

Proof: This claim is easy to verify by induction on $t$: assuming it is true of $|\psi_t\rangle$, we see by
inspection that the two entries in the support of column $(j, k, \ell)$ in our matrix correspond to
different measurement outcomes, so the projection selects exactly one of them for $|\psi_{t+1}\rangle$. □

Claim 9 For $t \equiv j \pmod{n+1}$ such that $j \ge \min(S)$, the Quantum Generator is in a basis state
$(j, k, \ell)$ where $k = \bigoplus_{t' : t' > t - j,\ t' \bmod (n+1) \in S} x_{t'}$.

Proof: Note first that if $t \equiv \min(S) \pmod{n+1}$, then $\bigoplus_{t' : t' > t - j,\ t' \bmod (n+1) \in S} x_{t'} = x_t$. Thus,
since by Claim 8 $|\psi_{t-1}\rangle$ was a basis state, by construction we obtain $x_t = 1$ if $|\psi_t\rangle$ is supported
by $(t \bmod n + 1, 1, \ell)$ and $x_t = 0$ when $|\psi_t\rangle$ is supported by $(t \bmod n + 1, 0, \ell)$. Suppose then for
induction that this holds up to $t \pmod{n+1} - 1 \ge \min(S)$. Then, if $t \pmod{n+1} \in S$, we see that
by construction, if $k = b_0$ and $x_t = b$, $|\psi_t\rangle$ is supported by the basis state
$(t \bmod n + 1, b_0 \oplus b, b_0)$, as needed. Otherwise, $|\psi_t\rangle$ is supported by the basis state
$(t \bmod n + 1, b_0, b)$ so in any case, the claim holds. □
We now observe that for $t \not\equiv 0 \pmod{n+1}$, by Claim 8 and further inspection, $|\psi_t\rangle$ is supported
by a basis state $(t \bmod n + 1, k, \ell)$ in the support of the measurement outcome 0 with probability
1/2, and is similarly in the support of the measurement outcome 1 with probability 1/2, so each
such $x_t$ is uniformly distributed on $\{0, 1\}$. Moreover, by Claim 9, for $t \equiv n \pmod{n+1}$, $|\psi_t\rangle$ is
supported on a basis state of the form $(n, b, \ell)$ for $b = \bigoplus_{t' \in \{t-n+1, \ldots, t\} : t' \bmod (n+1) \in S} x_{t'}$. Thus,
by construction, $|\psi_{t+1}\rangle$ is supported by a basis state of the form $(0, b, \ell)$ with probability $1 - \eta$
and of the form $(0, b \oplus 1, \ell)$ with probability $\eta$; since these correspond to measurements
$x_{t+1} = b$ with probability $1 - \eta$ and $x_{t+1} = b \oplus 1$ with probability $\eta$, we see that every $(n+1)$
symbols of the output of this Quantum Generator are distributed precisely according to the
distribution of independent random labeled examples of $f_S$ with noise rate $\eta$, as desired.
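
The construction and the correctness argument can be checked numerically on the instance of Figure 1 ($n = 5$, $S = \{3, 5\}$). The sketch below follows the entries and measurement partition described in this appendix; the sparse dictionary representation, the function names, and the noise rate 0.1 are illustrative assumptions.

```python
import itertools
import math

def build_generator(n, S, eta):
    """Return (U, outcome) for the 4(n+1)-state construction.

    U[col] is a dict {row: amplitude}; outcome[state] is the measured bit.
    """
    S, lo = set(S), min(S)
    r, ir = 1 / math.sqrt(2), 1j / math.sqrt(2)
    U, outcome = {}, {}
    for j, k, l in itertools.product(range(n + 1), (0, 1), (0, 1)):
        # Measurement partition.
        if j == 0 or j == lo:
            outcome[(j, k, l)] = k
        elif j in S:
            outcome[(j, k, l)] = k ^ l
        else:
            outcome[(j, k, l)] = l
        # Column (j, k, l) of the unitary.
        nxt = (j + 1) % (n + 1)
        if j == n:
            U[(j, k, l)] = {(0, k, l): math.sqrt(1 - eta),
                            (0, k ^ 1, l): 1j * math.sqrt(eta)}
        elif nxt == lo:
            U[(j, k, l)] = {(nxt, k, l): r, (nxt, k ^ 1, l): ir}
        elif nxt in S:
            U[(j, k, l)] = {(nxt, k ^ l, k): r, (nxt, 1 ^ k ^ l, k): ir}
        else:
            U[(j, k, l)] = {(nxt, k, l): r, (nxt, k, l ^ 1): ir}
    return U, outcome

def is_unitary(U, tol=1e-12):
    """Check that the columns of the square matrix U are orthonormal."""
    for a, b in itertools.product(U, repeat=2):
        ip = sum(complex(amp).conjugate() * U[b].get(row, 0)
                 for row, amp in U[a].items())
        if abs(ip - (a == b)) > tol:
            return False
    return True

def string_probability(U, outcome, start, xs):
    """Return ||M_{x_n} U ... M_{x_1} U e_start||_2^2."""
    psi = {start: 1.0}
    for x in xs:
        new = {}
        for col, a in psi.items():
            for row, u in U[col].items():
                if outcome[row] == x:          # projection M_x
                    new[row] = new.get(row, 0) + a * u
        psi = new
    return sum(abs(a) ** 2 for a in psi.values())

n, S, eta = 5, {3, 5}, 0.1
U, outcome = build_generator(n, S, eta)
mismatches = 0
for x in itertools.product((0, 1), repeat=n):
    label = sum(x[i - 1] for i in S) % 2
    for y in (0, 1):
        want = 2 ** -n * ((1 - eta) if y == label else eta)
        got = string_probability(U, outcome, (0, 0, 0), x + (y,))
        mismatches += abs(got - want) > 1e-12
```

Here `mismatches` counts the labeled strings whose probability differs from the noisy parity distribution, and `is_unitary(U)` performs the check of Claim 7 directly on the matrix.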

Hardness of learning a parity distribution: Suppose that we could efficiently learn the
output distribution of this Quantum Generator. In particular, for any desired $\epsilon$ we can therefore
efficiently learn a circuit $E$ such that $KL(P_S\|E) < \epsilon(1 - H(\eta))$, where $H$ is the binary entropy
function. For this circuit $E$, observe that if $E(x, f_S(x)) \le E(x, \neg f_S(x))$, then it is easy to verify
by elementary calculus (the minimum is achieved at $E(x, f_S(x)) = E(x, \neg f_S(x))$) that $x$ contributes

$$
\frac{1}{2^n}\left(\eta\log\frac{1}{E(x, \neg f_S(x))} + (1 - \eta)\log\frac{1}{E(x, f_S(x))}\right) \ge \frac{1}{2^n}\left(1 + \log\frac{1}{E(x)}\right)
$$

to $KL(P_S\|E)$ (before the entropy of $P_S$ is subtracted), where $E(x)$ denotes $E(x, 0) + E(x, 1)$. On the
rest of the distribution, $E$ certainly encodes $P_S$ no better than the optimal encoding for $P_S$, so
we find that if more than an $\epsilon$ fraction of $x$ satisfy $E(x, f_S(x)) \le E(x, \neg f_S(x))$, then

$$
KL(P_S\|E) > \epsilon(1 + n) + (1 - \epsilon)(H(\eta) + n) - (H(\eta) + n) = \epsilon(1 - H(\eta))
$$

contradicting our assumption about the KL-divergence of $E$ from $P_S$. Therefore we find that, for a
uniformly chosen $x \in \{0,1\}^n$, the circuit $E'$ that outputs $b$ iff $E(x, b) > E(x, \neg b)$ correctly
predicts $f_S(x)$ with probability at least $1 - \epsilon$. This simple modification of $E$ can be output
efficiently, contradicting the assumed hardness of learning noisy parities.
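
The final step of the reduction, turning an evaluator $E$ into a predictor $E'$, amounts to comparing the two possible labels. The following minimal sketch uses a hypothetical evaluator (the exact noisy parity distribution for $S = \{3, 5\}$) in place of a learned circuit, and breaks ties toward 0 as an arbitrary assumption.

```python
import itertools

def predictor_from_evaluator(E):
    """Build E': output the label b that E considers more likely for x."""
    def predict(x):
        # Ties are broken toward 0; this choice is an assumption.
        return 1 if E(x + (1,)) > E(x + (0,)) else 0
    return predict

# Hypothetical evaluator: the exact noisy parity distribution.
n, S, eta = 5, {3, 5}, 0.1

def f_S(x):
    return sum(x[i - 1] for i in S) % 2

def E(z):
    x, b = z[:-1], z[-1]
    return 2 ** -n * ((1 - eta) if b == f_S(x) else eta)

predict = predictor_from_evaluator(E)
errors = sum(predict(x) != f_S(x) for x in itertools.product((0, 1), repeat=n))
```

With an exact evaluator the predictor never errs; with a learned evaluator of small KL-divergence, the argument above bounds the fraction of inputs on which it can err by $\epsilon$.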